feat: add Sandbox abstraction for agent code execution environments#1968
feat: add Sandbox abstraction for agent code execution environments#1968agent-of-mkmeral wants to merge 1 commit intostrands-agents:mainfrom
Conversation
Add the Sandbox interface that decouples tool logic from where code runs. Tools that need to execute code or access a filesystem receive a Sandbox instead of managing their own execution, enabling portability across local, Docker, and cloud environments. Core components: - Sandbox ABC with streaming AsyncGenerator interface (base.py) - LocalSandbox for host-process execution via asyncio subprocesses (local.py) - DockerSandbox for containerized execution via docker exec (docker.py) - Agent integration: sandbox parameter on Agent.__init__, defaults to LocalSandbox Key design decisions: - Only core abstractions in SDK; AgentCoreSandbox and sandbox tools belong in separate packages (external dependencies, different release cycles) - Streaming output via AsyncGenerator[str | ExecutionResult] - yields lines as they arrive, ExecutionResult as the final yield - Security: randomized heredoc delimiters, shlex.quote for all paths, stdin piping in DockerSandbox to prevent injection - Auto-start lifecycle: sandbox starts on first execute() call - Zero external dependencies for core sandbox package Tests: 76 new tests (base: 22, local: 17, docker: 18, agent: 6, + shared) All 76 sandbox tests passing, 1686 existing tests still passing.
agent-of-mkmeral
left a comment
There was a problem hiding this comment.
Adversarial Testing Result: FAIL — 3 bugs found (1 security, 1 crash, 1 code correctness)
Scope: Full adversarial testing of Sandbox abstraction (base.py, local.py, docker.py, agent integration). Tested shell injection, path traversal, heredoc injection, concurrency, lifecycle edge cases, timeout cleanup, type safety, and contract verification.
Tests written: 67 (61 adversarial + 6 confirmation)
Tests passing: 63
Tests failing (findings): 4 (3 genuine bugs, 1 test infrastructure)
Findings Summary
| # | Category | Severity | Description |
|---|---|---|---|
| 1 | Security Issue | Critical | execute_code() language parameter injection — unsanitized |
| 2 | Bug | High | Long output lines (>64KB) crash readline() with ValueError |
| 3 | Bug | Medium | Non-existent working directory raises unhandled FileNotFoundError |
| 4 | Code Smell | Low | proc.returncode or 0 masks None as 0 (false success) |
Finding 1: execute_code() language parameter injection
Category: Security Issue
Severity: Critical
Reproduction:
sandbox = LocalSandbox(working_dir="/tmp")
marker = "/tmp/injection_marker.txt"
malicious_language = f"touch {marker} #"
await sandbox._execute_code_to_result("print('safe')", language=malicious_language)
assert os.path.exists(marker) # True — injection succeeded!Observed behavior: The language parameter is interpolated directly into the shell command without sanitization: f"{language} -c {shlex.quote(code)}". Code is properly shlex.quote()'d, but language is NOT. Any caller passing untrusted input as language enables arbitrary command execution.
Expected behavior: language should be validated against an allowlist (e.g., python, ruby, node) or at minimum shlex.quote()'d.
Fix: Either shlex.quote(language) or validate against an allowlist:
# Option A: Quote it
async for chunk in self.execute(f"{shlex.quote(language)} -c {shlex.quote(code)}", timeout=timeout):
yield chunk
# Option B: Allowlist (stricter)
ALLOWED_LANGUAGES = {"python", "python3", "ruby", "node", "bash", "sh"}
if language not in ALLOWED_LANGUAGES:
raise ValueError(f"Unsupported language: {language}")Finding 2: Long output lines crash readline()
Category: Bug
Severity: High
Reproduction:
sandbox = LocalSandbox(working_dir="/tmp")
await sandbox._execute_to_result("echo " + "A" * 100000)
# Raises: ValueError: Separator is found, but chunk is longer than limitObserved behavior: asyncio.StreamReader.readline() has a default buffer limit of 64KB. When a command outputs a single line exceeding this, readline() raises ValueError('Separator is found, but chunk is longer than limit'). This propagates as an unhandled exception.
Expected behavior: The sandbox should handle long output lines gracefully. Either increase the buffer limit when creating the subprocess, or catch ValueError/LimitOverrunError in _read_stream.
Fix: Set a larger limit when creating the subprocess:
proc = await asyncio.create_subprocess_shell(
command,
cwd=self.working_dir,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
limit=1024 * 1024, # 1MB line limit
)Note: asyncio.create_subprocess_shell accepts a limit parameter that's passed to the StreamReader.
Finding 3: Non-existent working directory crashes
Category: Bug
Severity: Medium
Reproduction:
sandbox = LocalSandbox(working_dir="/nonexistent/path")
await sandbox._execute_to_result("echo hello")
# Raises: FileNotFoundError: [Errno 2] No such file or directory: '/nonexistent/path'Observed behavior: asyncio.create_subprocess_shell(cwd="/nonexistent") raises FileNotFoundError which propagates unhandled.
Expected behavior: Either validate the working directory in __init__/start(), or catch the exception in execute() and return an ExecutionResult with a non-zero exit code and descriptive stderr.
Finding 4: proc.returncode or 0 masks None as 0
Category: Code Smell
Severity: Low
Location: local.py:115, docker.py:131, docker.py:262
Issue: The pattern proc.returncode or 0 uses Python's falsy semantics. If returncode is None (process not finished), it maps to 0 (success). The correct pattern is 0 if proc.returncode is None else proc.returncode.
Risk: Low in practice (after proc.wait(), returncode should never be None), but the code is semantically incorrect and fragile.
Other observations (non-findings)
✅ Heredoc injection: Base class write_file uses randomized STRANDS_EOF_{hex} delimiter — resistant to injection. LocalSandbox overrides with native file I/O (even better).
✅ DockerSandbox stdin piping: write_file uses stdin pipe instead of heredoc — immune to shell metacharacter injection.
✅ shlex.quote() on file paths: All path arguments are properly quoted.
✅ Concurrent execution: Multiple concurrent execute() calls work correctly.
✅ Context manager cleanup: Exception during async with properly calls stop().
✅ Docker container cleanup: stop() handles rm failures gracefully.
✅ Timeout process killing: Local and Docker sandboxes properly kill processes on timeout.
write_file("../escape.txt") writes outside working directory. This is by design for LocalSandbox (host execution), but should be clearly documented as a security boundary. DockerSandbox containment handles this via container isolation.
Artifacts: All adversarial tests provided in the review above.
🤖 AI agent response. Strands Agents. Feedback welcome!
Summary
Add the Sandbox interface that decouples tool logic from where code runs. Tools that need to execute code or access a filesystem receive a Sandbox instead of managing their own execution, enabling portability across local, Docker, and cloud environments.
Related: Design Doc PR #681
What Changed
New Files (4 source + 5 test)
src/strands/sandbox/__init__.pysrc/strands/sandbox/base.pySandboxABC with streamingAsyncGeneratorinterface,ExecutionResult, convenience methods (read_file,write_file,list_files,execute_code), auto-start lifecyclesrc/strands/sandbox/local.pyLocalSandbox— native file I/O overrides, asyncio subprocess executionsrc/strands/sandbox/docker.pyDockerSandbox— container lifecycle, stdin piping for safe file writes,docker execstreamingtests/strands/sandbox/test_base.pytests/strands/sandbox/test_local.pytests/strands/sandbox/test_docker.pytests/strands/sandbox/test_agent_sandbox.pyModified Files (2)
src/strands/__init__.pySandbox,LocalSandbox,DockerSandbox,ExecutionResultsrc/strands/agent/agent.pysandboxparameter toAgent.__init__, defaults toLocalSandbox()Design Decisions
✅ IN core SDK
SandboxABC — the interface contractLocalSandbox— zero-dependency host execution (default)DockerSandbox— containerized executionExecutionResultdataclasssandboxparameter)❌ NOT in core SDK (for now)
AgentCoreSandbox— External dependency onbedrock-agentcorepackage. Should live in a separate packagerun_command,python_tool,programmatic_tool_caller) — These are tools, not core abstractions. They belong instrands-agents/toolsCodeActPlugin— P2 in the design doc, not yet readyRationale: The core SDK should provide the minimal, zero-external-dependency abstraction. Tools and cloud-specific sandbox implementations have their own dependency graphs and release cycles.
Key Design Details
Streaming Interface
Agent Integration
Security
secrets.token_hex(8)delimitershlex.quote()for all pathsshlex.quote()for code argumentsTests
What's Left (Outside This PR)
AgentCoreSandbox— needs its own package withbedrock-agentcoredependencystrands-agents/toolsCodeActPlugin— P2 per design doccc @mkmeral